Skip to content

Conversation

@rgarcia
Copy link
Contributor

@rgarcia rgarcia commented Jan 12, 2026

Summary

  • Replace HTTP connection hijacking for PTY attach with WebSocket-based protocol
  • Improves compatibility with load balancers, reverse proxies, and API gateways
  • Add plan document describing the migration and protocol design

Background

The previous HTTP hijack approach for /process/{process_id}/attach doesn't work well when there are multiple layers of proxying in front of this image. WebSockets provide a more robust solution for bidirectional streaming that is well-understood by intermediary infrastructure.

Protocol

The new WebSocket attach protocol:

Direction Message Type Content
Client → Server Binary Raw bytes for PTY stdin
Server → Client Binary Raw bytes from PTY stdout
Client → Server Text (JSON) Control messages like {"type": "resize", "rows": 24, "cols": 80}
Server → Client Text (JSON) Events like {"type": "exit", "exitCode": 0}

Changes

  • server/cmd/api/api/process.go: Add HandleProcessAttachWS WebSocket handler
  • server/cmd/api/main.go: Route attach endpoint to new WebSocket handler
  • plans/migrate-tty-attach-to-websocket.md: Design document

Test plan

  • All unit tests pass
  • Manual testing with WebSocket client
  • Test through proxy to verify WebSocket works with intermediaries

Checklist

  • A link to a related issue in our repository
  • A description of the changes proposed in the pull request.
  • @mentions of the person or team responsible for reviewing proposed changes.

Note

Implements a WebSocket-based PTY attach that is proxy-friendly and supports control framing.

  • Adds HandleProcessAttachWS using github.com/coder/websocket; streams stdin/stdout as binary and handles JSON control (resize, exit, error)
  • Serializes WS writes, coordinates graceful shutdown across PTY/WS/process, and emits exit code before close
  • Routes GET /process/{process_id}/attach to the new WS handler in main.go (legacy hijack handler no longer routed)
  • Introduces server/lib/ptyio with AttachControlMessage, message type enums, constants, and ReadPTYToWriter (poll-based PTY reader)
  • Returns JSON errors when attach fails (not found, not PTY, conflict)

Written by Cursor Bugbot for commit a185e48. This will update automatically on new commits. Configure here.

Replace the HTTP connection hijacking approach for PTY attach with a
WebSocket-based protocol that works better through proxy layers.

The new WebSocket protocol uses:
- Binary messages for stdin/stdout data
- JSON text messages for control (resize) and events (exit code)

This improves compatibility with load balancers, reverse proxies, and
API gateways that may not properly support HTTP connection hijacking.
Copy link
Contributor

@tembo tembo bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice migration away from HTTP hijack — the WS framing/control channel will be a lot easier to operate through proxies.

A few small nits from the new handler:

  • http.Error sends text/plain + newline, so the early error responses aren’t actually JSON.
  • Consider being deliberate about OriginPatterns: ["*"] (can become surprising if cookie-based auth ever shows up).
  • The exit event send currently relies on a Sleep(50ms); making the send deterministic would avoid occasional dropped exit messages.
  • Invalid resize messages are silently ignored; a warn-level log would help debug client behavior.

Comment on lines 781 to 783
if !ok {
http.Error(w, `{"type":"error","message":"process not found"}`, http.StatusNotFound)
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

http.Error will respond as text/plain and also appends a newline, so these "JSON" error bodies won’t actually be JSON for clients.

Suggested change
if !ok {
http.Error(w, `{"type":"error","message":"process not found"}`, http.StatusNotFound)
return
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(http.StatusNotFound)
_, _ = w.Write([]byte(`{"type":"error","message":"process not found"}`))
return


// Accept WebSocket connection
wsConn, err := websocket.Accept(w, r, &websocket.AcceptOptions{
OriginPatterns: []string{"*"},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OriginPatterns: []string{"*"} makes it easy to accidentally allow cross-site WebSocket requests if auth is cookie-based (CSWSH-style issues). If this endpoint is meant to be authenticated via headers/tokens only, might be worth either documenting that assumption here or tightening the origin policy.

Comment on lines 810 to 812
if err != nil {
log.Error("websocket accept failed", "err", err)
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On handshake failures, we only log and return. Consider also writing a response so non-WS clients get an actionable error.

Suggested change
if err != nil {
log.Error("websocket accept failed", "err", err)
return
log.Error("websocket accept failed", "err", err)
http.Error(w, "websocket upgrade failed", http.StatusBadRequest)
return

Comment on lines 969 to 977
exitMsg := attachControlMessage{Type: "exit", ExitCode: exitCode}
data, _ := json.Marshal(exitMsg)
select {
case writeCh <- wsWriteOp{msgType: websocket.MessageText, data: data}:
case <-done:
}
// Give a moment for the message to be sent before closing
time.Sleep(50 * time.Millisecond)
shutdown()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The time.Sleep(50 * time.Millisecond) here feels a bit racy (under load or slow clients the exit message can be dropped if the writer goroutine hasn’t flushed yet). Might be worth making the exit send deterministic (e.g., add an ack channel to wsWriteOp and wait for the write to complete / fail before shutting down).

Comment on lines 889 to 897
case "resize":
if ctrl.Rows > 0 && ctrl.Cols > 0 && ctrl.Rows <= 65535 && ctrl.Cols <= 65535 {
ws := &pty.Winsize{Rows: uint16(ctrl.Rows), Cols: uint16(ctrl.Cols)}
if err := pty.Setsize(h.ptyFile, ws); err != nil {
log.Error("pty resize failed", "err", err)
} else {
log.Debug("pty resized", "rows", ctrl.Rows, "cols", ctrl.Cols)
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For invalid resize payloads we silently ignore. Even just a warning helps debug client-side issues.

Suggested change
case "resize":
if ctrl.Rows > 0 && ctrl.Cols > 0 && ctrl.Rows <= 65535 && ctrl.Cols <= 65535 {
ws := &pty.Winsize{Rows: uint16(ctrl.Rows), Cols: uint16(ctrl.Cols)}
if err := pty.Setsize(h.ptyFile, ws); err != nil {
log.Error("pty resize failed", "err", err)
} else {
log.Debug("pty resized", "rows", ctrl.Rows, "cols", ctrl.Cols)
}
}
case "resize":
if ctrl.Rows > 0 && ctrl.Cols > 0 && ctrl.Rows <= 65535 && ctrl.Cols <= 65535 {
ws := &pty.Winsize{Rows: uint16(ctrl.Rows), Cols: uint16(ctrl.Cols)}
if err := pty.Setsize(h.ptyFile, ws); err != nil {
log.Error("pty resize failed", "err", err)
} else {
log.Debug("pty resized", "rows", ctrl.Rows, "cols", ctrl.Cols)
}
} else {
log.Warn("invalid resize message", "rows", ctrl.Rows, "cols", ctrl.Cols)
}

shutdown()
case <-done:
}
}()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exit code message never sent to WebSocket client

High Severity

The process exit watcher waits on h.doneCh to send the exit code, but h.doneCh is closed after h.ptyFile.Close() in the ProcessSpawn waiter goroutine. When the PTY file is closed, the PTY reader goroutine immediately receives EIO and calls shutdown(), closing the done channel. The process exit watcher's outer select then sees done is closed and takes the case <-done: branch, exiting without ever sending the exit code message. Clients will never receive exit codes when processes terminate normally.

Additional Locations (1)

Fix in Cursor Fix in Web

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exit code message never sent to WebSocket client

High Severity

The process exit watcher waits on h.doneCh to send the exit code, but h.doneCh is closed after h.ptyFile.Close() in the ProcessSpawn waiter goroutine. When the PTY file is closed, the PTY reader goroutine immediately receives EIO and calls shutdown(), closing the done channel. The process exit watcher's outer select then sees done is closed and takes the case <-done: branch, exiting without ever sending the exit code message. Clients will never receive exit codes when processes terminate normally.

Additional Locations (1)

Fix in Cursor Fix in Web

It looks like this is actually an issue? We need to send a proper closed signal for client to distinguish normal error and disconnection?

- Move AttachControlMessage to lib/ptyio for SDK consumption
- Add AttachMessageType enum with Resize, Exit, Error constants
- Extract ReadPTYToWriter helper to eliminate duplication
- Add PollTimeoutMs and MaxTerminalDimension constants
- Remove dead code (HandleProcessAttach, copyPTYToConn)
- Add warning log when resize dimensions are out of range
- Truncate invalid JSON in error logs to avoid spam
- Add comment explaining context.Background() usage

// Close WebSocket gracefully
wsConn.Close(websocket.StatusNormalClosure, "")
log.Info("websocket attach ended", "process_id", id)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Race condition allows concurrent PTY attach sessions

Medium Severity

The handler clears attachActive via defer when it returns, but doesn't wait for spawned goroutines to exit first. After <-done unblocks, the handler returns immediately while the PTY reader goroutine may still be running for up to 100ms (due to PollTimeoutMs in ReadPTYToWriter). During this window, a new client can attach since attachActive is cleared, resulting in two goroutines concurrently reading from the same PTY file. This can cause terminal output to be interleaved or lost between sessions. A sync.WaitGroup is needed to synchronize goroutine completion before clearing the attach flag.

Additional Locations (1)

Fix in Cursor Fix in Web

- Fix exit code never sent: restructure so exit watcher sends message
  before triggering shutdown, then waits for writer to drain
- Fix writer dropping pending messages: drain writeCh after done is
  closed before exiting writer goroutine
- Fix race on attachActive: add WaitGroup to wait for all goroutines
  to complete before clearing the flag
- Fix pre-upgrade errors returning text/plain: add writeJSON helper
  that sets correct Content-Type
- Fix WebSocket upgrade failure not sending response: add http.Error
  fallback for non-WS clients
- Add comment explaining OriginPatterns ["*"] is safe because endpoint
  uses token auth, not cookies
@rgarcia rgarcia requested a review from hiroTamada January 12, 2026 23:11
@hiroTamada
Copy link
Contributor

Question: This migrates from HTTP hijack to WebSocket, which appears to be a breaking change for existing clients—they'd need to implement WebSocket upgrade handshake and the new message framing protocol (binary for data, JSON text messages for control/resize/exit).

Could you confirm:

  1. Is this a breaking change for existing consumers of the /process/{id}/attach endpoint?
  2. If so, are there known clients that need coordinated updates?
  3. What's the migration plan (e.g., client-side PR, documentation, changelog entry)?

The design doc mentions "Need to check if there are any clients using the attach endpoint" as an open question—wanted to make sure this is resolved before merge.

@rgarcia
Copy link
Contributor Author

rgarcia commented Jan 13, 2026 via email

Copy link
Contributor

@hiroTamada hiroTamada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exit code never sent due to race condition comment from cursor seems like a real issue. Other than that, LGTM

@rgarcia
Copy link
Contributor Author

rgarcia commented Jan 13, 2026

The cursor bot comment is referencing the first commit (9a87ed2), but we addressed this in the follow-up commit (a185e48). The fix restructures the exit watcher to send the exit message before calling shutdown(), then waits on <-writerDone to ensure the message actually gets delivered before we close everything down. You can see the updated logic at lines 829-854.

@rgarcia rgarcia merged commit 0910873 into main Jan 13, 2026
4 of 5 checks passed
@rgarcia rgarcia deleted the rgarcia/websocket-tty-attach branch January 13, 2026 02:44
@hiroTamada
Copy link
Contributor

The cursor bot comment is referencing the first commit (9a87ed2), but we addressed this in the follow-up commit (a185e48). The fix restructures the exit watcher to send the exit message before calling shutdown(), then waits on <-writerDone to ensure the message actually gets delivered before we close everything down. You can see the updated logic at lines 829-854.

sounds good. my pr review prompt was looking at the older commit

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants